Model-Based Recognition and Parameter Estimation of Buildings from Multi-View Aerial Imagery Using Multi-Segmentation
This paper describes a system for the analysis of aerial images of urban areas using multiple images from different viewpoints. The emphasis is on the experimental evaluation using segmented images obtained by applying three different parameter settings in the segmentation process. The proposed approach combines bottom-up and top-down processing. To statistically evaluate the performance of the system, a set of 50 realisations of 5 images from different viewpoints was used, generated by combining real and ray-traced images. The experiments show a significant improvement in reliability and accuracy if multi-segmentation is used in multi-view imagery instead of single segmentation.
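The statistical evaluation described above can be sketched as follows: a per-realisation error is summarized by its mean and spread, and the two segmentation strategies are compared on those summaries. All numbers below are made up for illustration; this is not the authors' evaluation code.

```python
# Hypothetical sketch of evaluating reconstruction error over many
# realisations, comparing single- vs multi-segmentation. Numbers are invented.
from statistics import mean, stdev

def summarize(errors):
    """Mean and standard deviation of per-realisation errors."""
    return mean(errors), stdev(errors)

single_seg = [1.9, 2.4, 2.1, 2.6, 2.0]  # error per realisation, single segmentation
multi_seg = [1.2, 1.4, 1.3, 1.5, 1.1]   # error per realisation, multi-segmentation

print(summarize(single_seg))
print(summarize(multi_seg))  # lower mean and spread: more accurate and more reliable
```

A lower mean indicates better accuracy, and a lower standard deviation over realisations indicates better reliability, which is the comparison the abstract reports.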
Are current long-term video understanding datasets long-term?
Many real-world applications, from sport analysis to surveillance, benefit
from automatic long-term action recognition. In the current deep learning
paradigm for automatic action recognition, it is imperative that models are
trained and tested on datasets and tasks that evaluate if such models actually
learn and reason over long-term information. In this work, we propose a method
to evaluate how suitable a video dataset is to evaluate models for long-term
action recognition. To this end, we define a long-term action task as one that excludes all
videos that can be correctly recognized using solely short-term
information. We test this definition on existing long-term classification tasks
on three popular real-world datasets, namely Breakfast, CrossTask and LVU, to
determine if these datasets are truly evaluating long-term recognition. Our
study reveals that these datasets can be effectively solved using shortcuts
based on short-term information. Following this finding, we encourage long-term
action recognition researchers to make use of datasets that need long-term
information to be solved.
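The proposed definition amounts to a filtering step: keep only the videos that a short-term model misclassifies. A minimal sketch, where `short_term_predict` stands in for any model that only sees a short clip:

```python
# Sketch of the paper's filtering idea: a dataset only tests long-term
# recognition on the videos a short-term model gets wrong.
def filter_long_term(videos, labels, short_term_predict):
    """Keep only (video, label) pairs NOT recoverable from short-term cues."""
    kept = []
    for video, label in zip(videos, labels):
        if short_term_predict(video) != label:  # short-term shortcut fails here
            kept.append((video, label))
    return kept

# Toy usage: a shortcut model that always predicts "pour".
videos = ["clip_a", "clip_b", "clip_c"]
labels = ["pour", "assemble", "pour"]
remaining = filter_long_term(videos, labels, lambda v: "pour")
print(remaining)  # only the video the shortcut misclassifies survives
```

If little of a dataset survives this filter, the dataset can be solved with short-term shortcuts, which is the study's finding for Breakfast, CrossTask and LVU.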
Video BagNet: short temporal receptive fields increase robustness in long-term action recognition
Previous work on long-term video action recognition relies on deep
3D-convolutional models that have a large temporal receptive field (RF). We
argue that these models are not always the best choice for temporal modeling in
videos. A large temporal receptive field allows the model to encode the exact
sub-action order of a video, which causes a performance decrease when testing
videos have a different sub-action order. In this work, we investigate whether
we can improve the model robustness to the sub-action order by shrinking the
temporal receptive field of action recognition models. For this, we design
Video BagNet, a variant of the 3D ResNet-50 model with the temporal receptive
field size limited to 1, 9, 17 or 33 frames. We analyze Video BagNet on
synthetic and real-world video datasets and experimentally compare models with
varying temporal receptive fields. We find that short receptive fields are
robust to sub-action order changes, while larger temporal receptive fields are
sensitive to the sub-action order.
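How the temporal receptive field grows with depth, and why collapsing temporal kernel sizes caps it, can be illustrated with the standard receptive-field recurrence. This is a generic sketch, not the Video BagNet code; the layer configurations are invented.

```python
# Illustrative sketch: temporal receptive field of stacked (3D) convolutions,
# using the standard recurrence rf += (k - 1) * jump, jump *= stride.
def temporal_rf(layers):
    """layers: list of (temporal_kernel, temporal_stride) pairs."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

# Sixteen layers with temporal kernel 3 everywhere: the RF keeps growing.
print(temporal_rf([(3, 1)] * 16))             # 33 frames
# Same depth, but temporal kernels collapsed to 1 after the first layer:
print(temporal_rf([(3, 1)] + [(1, 1)] * 15))  # stays at 3 frames
```

Limiting the kernels that look across time is one way to bound the temporal receptive field regardless of depth, which is the design knob the Video BagNet variants (RF of 1, 9, 17 or 33 frames) turn.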
Linear color correction for multiple illumination changes and non-overlapping cameras
Many image processing methods, such as techniques for people re-identification, assume photometric constancy between different images. This study addresses the correction of photometric variations, using changes in background areas to correct foreground areas. The authors assume a multiple-light-source model in which all light sources can have different colours and can change over time. In training mode, the authors learn per-location relations between foreground and background colour intensities. In correction mode, the authors apply a double linear correction model based on the learned relations. This double linear correction includes a dynamic local illumination correction mapping as well as an inter-camera mapping. The authors evaluate their illumination correction by computing the similarity between two images based on the earth mover's distance. The authors compare the results to a representative auto-exposure algorithm from the recent literature and to a colour correction algorithm based on inverse-intensity chromaticity. Especially in complex scenarios, the authors' method outperforms these state-of-the-art algorithms.
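One half of such a double linear correction can be sketched as a per-channel least-squares fit on background pixels, then applied to foreground pixels. This is a minimal illustration on made-up data, not the authors' implementation (which also learns per-location foreground-background relations and an inter-camera mapping).

```python
import numpy as np

# Hedged sketch: fit a per-channel linear map a*x + b on background pixels
# observed under two illuminations, then apply it to foreground pixels.
def fit_linear_map(bg_src, bg_dst):
    """Least-squares (a, b) per channel so that a*bg_src + b ~= bg_dst."""
    maps = []
    for c in range(bg_src.shape[1]):
        A = np.stack([bg_src[:, c], np.ones(len(bg_src))], axis=1)
        a, b = np.linalg.lstsq(A, bg_dst[:, c], rcond=None)[0]
        maps.append((a, b))
    return maps

def apply_map(pixels, maps):
    out = pixels.astype(float).copy()
    for c, (a, b) in enumerate(maps):
        out[:, c] = a * pixels[:, c] + b
    return out

# Toy check: the second illumination dims by 0.5 and adds an offset of 10.
bg_src = np.array([[40.0, 60.0, 80.0], [120.0, 140.0, 160.0]])
bg_dst = 0.5 * bg_src + 10.0
maps = fit_linear_map(bg_src, bg_dst)
fg = np.array([[100.0, 100.0, 100.0]])
print(apply_map(fg, maps))  # close to [[60, 60, 60]]
```

In this toy case the fit exactly recovers the simulated gain and offset, so the foreground is mapped consistently with the background change.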
Incremental concept learning with few training examples and hierarchical classification
Object recognition and localization are important to automatically interpret video and allow better querying
on its content. We propose a method for object localization that learns incrementally and addresses four key
aspects. Firstly, we show that for certain applications, recognition is feasible with only a few training samples.
Secondly, we show that novel objects can be added incrementally without retraining existing objects, which is
important for fast interaction. Thirdly, we show that an unbalanced number of positive training samples leads
to biased classifier scores that can be corrected by modifying weights. Fourthly, we show that the detector
performance can deteriorate due to hard-negative mining for similar or closely related classes (e.g., for Barbie
and dress, because the doll is wearing a dress). This can be solved by our hierarchical classification. We introduce
a new dataset, which we call TOSO, and use it to demonstrate the effectiveness of the proposed method for the
localization and recognition of multiple objects in images.
This research was performed in the GOOSE project, which is jointly funded by the enabling technology program
Adaptive Multi Sensor Networks (AMSN) and the MIST research program of the Dutch Ministry of Defense.
This publication was supported by the research program Making Sense of Big Data (MSoBD). Peer-reviewed.
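The score-bias issue described above can be illustrated with a simple calibration: with unbalanced positives, per-class scores live on different scales, so one fixed threshold is unfair across classes. The correction below (rescaling each class so its mean held-out positive score is 1.0) is an assumed illustration of the idea, not the paper's exact weight modification; class names echo the Barbie/dress example.

```python
# Hedged sketch: per-class score calibration to undo scale bias from an
# unbalanced number of positive training samples.
def calibrate(scores_per_class):
    """scores_per_class: {class: raw scores on held-out positives} -> factors."""
    return {c: 1.0 / (sum(s) / len(s)) for c, s in scores_per_class.items()}

def corrected_score(raw, cls, factors):
    return raw * factors[cls]

factors = calibrate({"barbie": [2.0, 4.0], "dress": [0.2, 0.6]})
# After correction, typical positives of both classes score around 1.0,
# so a single detection threshold treats the classes comparably.
print(corrected_score(3.0, "barbie", factors), corrected_score(0.4, "dress", factors))
```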
Interactive detection of incrementally learned concepts in images with ranking and semantic query interpretation
This research was performed in the GOOSE project, which is jointly funded by the MIST research program of the Dutch Ministry of Defense and the AMSN enabling technology program. The number of networked cameras is growing
exponentially. Multiple applications in different domains result
in an increasing need to search semantically over video sensor
data. In this paper, we present the GOOSE demonstrator, which
is a real-time general-purpose search engine that allows users to
pose natural language queries to retrieve corresponding images.
Top-down, this demonstrator interprets queries, which are
presented as an intuitive graph to collect user feedback. Bottom-up,
the system automatically recognizes and localizes concepts in
images and it can incrementally learn novel concepts. A smart
ranking combines both and allows effective retrieval of relevant
images. Peer-reviewed.
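The combination of top-down query interpretation and bottom-up detection can be sketched as a ranking over per-image concept scores. This is an illustrative assumption about the combination step, not the GOOSE implementation; images, concepts and scores are invented.

```python
# Illustrative sketch: rank images by the summed detector scores of the
# concepts extracted (top-down) from the user's query.
def rank_images(query_concepts, detections):
    """detections: {image: {concept: score}} -> images, best match first."""
    scored = []
    for image, scores in detections.items():
        total = sum(scores.get(c, 0.0) for c in query_concepts)
        scored.append((total, image))
    return [image for total, image in sorted(scored, reverse=True)]

detections = {
    "img1": {"car": 0.9, "person": 0.2},
    "img2": {"car": 0.1, "person": 0.8},
    "img3": {"car": 0.7, "person": 0.6},
}
print(rank_images(["car", "person"], detections))  # img3 ranks first
```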
TNO at TRECVID 2013 : multimedia event detection and instance search
We describe the TNO system and the evaluation results for the TRECVID 2013 Multimedia Event Detection (MED) and instance search (INS) tasks. The MED system consists of a bag-of-words (BOW) approach with spatial tiling that uses low-level static and dynamic visual features, an audio feature and high-level concepts. Automatic speech recognition (ASR) and optical character recognition (OCR) are not used in the system. In the MED case with 100 example training videos, support-vector machines (SVM) are trained and fused to detect an event in the test set. In the case with 0 example videos, positive and negative concepts are extracted as keywords from the textual event description and events are detected with the high-level concepts. The MED results show that the SIFT keypoint descriptor is the one that contributes best to the results, that fusion of multiple low-level features helps to improve the performance, and that the textual event-description chain currently performs poorly. The TNO INS system presents a baseline open-source approach using standard SIFT keypoint detection and exhaustive matching. In order to speed up search times for queries, a basic map-reduce scheme is presented to be used on a multi-node cluster. Our INS results show above-median results with acceptable search times. This research for the MED submission was performed in the GOOSE project, which is jointly funded by the enabling technology program Adaptive Multi Sensor Networks (AMSN) and the MIST research program of the Dutch Ministry of Defense. The INS submission was partly supported by the MIME project of the creative industries knowledge and innovation network CLICKNL. Peer-reviewed.
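The fusion step mentioned above, combining per-feature SVM scores into one event detector, can be sketched as a weighted average. The feature names and scores below are illustrative assumptions, not TNO's actual channels or fusion weights.

```python
# Hedged sketch of late fusion: each low-level feature yields its own SVM
# score per video; the fused detector is their weighted average.
def fuse_scores(per_feature_scores, weights=None):
    """per_feature_scores: {feature: {video: score}} -> {video: fused score}."""
    features = list(per_feature_scores)
    weights = weights or {f: 1.0 / len(features) for f in features}
    videos = next(iter(per_feature_scores.values())).keys()
    return {v: sum(weights[f] * per_feature_scores[f][v] for f in features)
            for v in videos}

scores = {
    "sift_bow": {"vid1": 0.9, "vid2": 0.3},
    "audio":    {"vid1": 0.5, "vid2": 0.7},
}
print(fuse_scores(scores))  # vid1 fused ~0.7, vid2 fused ~0.5
```

Uniform weights are the simplest choice; per-feature weights could instead be tuned on a validation set, which is where a strong channel such as SIFT would receive more influence.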